JNCI Cancer Spectrum — Latest Matching Preprints

1

Heterogeneity of survival outcomes in ypN1 breast cancer after neoadjuvant therapy: The role of residual nodal burden in axillary de-escalation

Luz, F. A. C. d.; Araujo, R. A. d.; Araujo, L. B. d.; Silva, M. J. B.

2026-03-05 oncology 10.64898/2026.03.04.26347623 medRxiv

Top 0.1%

9.9%

Background: The management of residual axillary disease after neoadjuvant therapy (NAT) remains controversial, as current recommendations often treat ypN1 breast cancer as a homogeneous entity despite potential prognostic heterogeneity. Evidence supporting uniform axillary surgical strategies across different levels of residual nodal burden is limited. We investigated whether survival associations related to axillary surgical evaluation differ according to residual nodal burden in ypN1 disease, using an adjuvant cohort to validate a SEER-based proxy for surgical extent. Methods: Patients with 1-3 positive lymph nodes were identified in the SEER database (2000-2022) and stratified into neoadjuvant (NAT; n=30,560) and adjuvant (AT; n=197,586) cohorts. Axillary surgical evaluation was categorized as limited (2-3 examined nodes) or extensive (≥10 examined nodes). Survival was analyzed using Kaplan-Meier methods and log-logistic accelerated failure-time models, adjusted with inverse probability of treatment weighting (IPTW). Results: In the ypN1 cohort, limited axillary evaluation was not associated with inferior overall survival among patients with a single residual positive node (IPTW-adjusted HR: 1.15, p=0.134; time ratio [TR]: 0.86, p=0.184). In contrast, limited evaluation was associated with worse survival in patients with two positive nodes (HR: 1.70, 95% CI 1.54-1.87; TR: 0.58, 95% CI 0.53-0.64). The findings were similar when using breast cancer-specific survival as the endpoint. Conclusions: Survival associations related to axillary surgical evaluation after NAT vary according to residual nodal burden. Axillary de-escalation appears feasible in patients with a single residual positive node but cannot be extrapolated to those with multiple residual nodes, underscoring heterogeneity within ypN1 disease.

2

Quantitative and qualitative patient-reported analysis of misdiagnosis and/or late diagnosis of metastatic lobular cancer

Cody, M. E.; Chang, H.-C.; Foldi, J.; Jankowitz, R. C.; Balic, M.; Cushing, T.; Donnelly, C.; Freeney, S.; Levine, J.; Petitti, L.; Ryan, N.; Spencer, K.; Turner, C.; Tseng, G. C.; Desmedt, C.; Oesterreich, S.; Lee, A. V.

2026-04-20 oncology 10.64898/2026.04.16.26348799 medRxiv

Top 0.1%

8.1%

Background: Invasive lobular breast cancer (ILC) is the most commonly diagnosed special histological subtype of breast cancer (BC). Metastatic ILC (mILC) is less readily detected by FDG-PET imaging and often metastasizes to unusual sites, including the peritoneum, gastrointestinal (GI) tract, ovaries, urinary tract, and orbit, where disease may go unrecognized after a long disease-free interval. Some metastatic sites cause nonspecific symptoms, like abdominal/epigastric pain, with numerous published case reports of mILC misdiagnosed as gastric cancer. These atypical BC metastatic sites may lead to late diagnosis and/or misdiagnosis, thereby delaying effective treatments. Objective: We developed a patient survey to investigate the patient-reported prevalence of delayed diagnosis or misdiagnosis of mILC and its potential impact on treatment outcomes. Methods: A 45-question survey was developed and piloted with breast cancer researchers, clinical oncologists, and patient advocates. This IRB-approved survey was then distributed to patients with ILC. Analyses, including data QC and visualization, were conducted in R using descriptive statistics. Incomplete or inconsistent responses were excluded, and summary statistics were stratified by four common mILC sites to highlight subgroup differences. Results: A total of 525 patient surveys were completed, with 450 patients diagnosed with ILC, and of those 321 diagnosed with mILC. For those with mILC, 33.3% (n=107) were diagnosed with de novo mILC at initial presentation. Of the patients diagnosed with mILC, 32.1% (n=103) presented with other medical conditions at diagnosis. Misdiagnosis was reported by 26.2% (n=84) of patients with mILC, and of these cases, 31% (n=26) had ≥2 misdiagnoses. The top 5 misdiagnoses were bone-related condition (24.7%), benign breast condition (23.4%), another type of BC (7.8%), diagnostic delay (7.8%), and menopause related (5.2%). 44.5% of patients waited ≥1 year for an accurate diagnosis. Forty-nine patients were treated for their misdiagnosis, and 6 received incorrect cancer treatments. The most frequently reported contributors to delayed diagnosis or misdiagnosis were inconclusive imaging, providers' lack of ILC knowledge, and initial misdiagnosis. Of the 321 patients with mILC, 138 (42.9%) reported symptoms before diagnosis; the most common were back pain (16.5%), fatigue/malaise (14.9%), GI symptoms (11.8%), bloating (8.4%), and weight loss (8.1%). Although 40% of patients reported having a mammogram at the time of their initial misdiagnosis, ILC was detected in only 20.5% (24/116) of these cases, and mammography detected only 5 (25%) of the 20 de novo mILC cases. Patients reported additional diagnostic testing within 1-3 months of their initial mammogram, including biopsy, ultrasound (US), and MRI. 47.9% of patients were in active BC surveillance after curative-intent therapy at the time of their mILC diagnosis; however, no statistically significant difference was seen in time to diagnosis versus those patients not under surveillance. Conclusion: Our survey results underscore the urgent need to improve diagnostic strategies for mILC. Addressing delays and diagnostic errors in mILC is critical to optimizing treatment strategies and improving patient outcomes.

3

Retrospective cohort study extracting coexisting background breast-lesion features from stage I-III invasive breast cancer

Lim, R. J. Y.; Nitar, P.; Lau, K. W.; Leong, L. C. H.; Lim, G. H.; Tan, V. K. M.; Tan, B. K. T.; Tan, E. Y.; Goh, S. S. N.; Hartman, M.; Wong, F. Y.; Li, J.; Joint Breast Cancer Registry

2026-05-22 oncology 10.64898/2026.05.19.26353633 medRxiv

Top 0.1%

8.0%

Background: Background breast features are frequently noted in pathology reports alongside invasive breast cancer but rarely factor into prognosis or treatment decisions. Their relationship to tumor characteristics and patient outcomes remains incompletely characterised. Methods: We conducted a retrospective cohort study of 7,603 patients with Stage I-III invasive breast cancer (diagnosed 1991-2022, age <80 years) from the Joint Breast Cancer Registry in Singapore. Natural language processing (NLP) was applied to 9,754 free-text pathology reports to extract co-existing background breast features, with accuracy validated by dual-reviewer assessment of 200 reports. Unsupervised hierarchical clustering grouped extracted features into three categories. Associations with tumor characteristics were assessed by multinomial logistic regression, and ten-year overall survival by Cox proportional hazards models (median follow-up 9.6 years; 620 deaths). Results: Here we show that NLP-based extraction of background breast features from routine pathology reports achieves an accuracy of over 90% across features. Lobular neoplasia and benign proliferative changes are associated with less aggressive tumor characteristics, whereas early neoplastic and papillary lesions are more prevalent in HER2-enriched and luminal B tumor subtypes. Benign proliferative changes are associated with better survival in age- and year-adjusted models (hazard ratio 0.91, 95% CI 0.86-0.97), but this association is attenuated after adjustment for stage and subtype. Conclusions: NLP-enabled extraction of background breast features from pathology text is feasible at scale. These features reflect tumor biology but do not independently add prognostic information beyond established clinical variables.

4

Macrophage spatial polarity to T cells predicts prognosis in young women with luminal breast cancer

Mezheyeuski, A.; Serna, G.; Martin-Bernabe, A.; Hekmati, N.; Zerdes, I.; Denes, A.; Fredholm, H.; Mauchanski, S.; Guardia, X.; Alonso, L.; De Mey, L.; Lahoutte, T.; Keyaerts, M.; Lindblad, J.; Sladoje, N.; Warnberg, F.; Sund, M.; Rask, G.; Wadsten, C.; Ponten, F.; Micke, P.; Fredriksson, I.; Nuciforo, P.

2026-05-24 oncology 10.64898/2026.05.17.26352909 medRxiv

Top 0.1%

7.1%

Purpose: The prognostic role of tumor-infiltrating lymphocytes in luminal breast cancer remains uncertain, partly because density-based metrics do not capture spatial interactions between immune cell subsets. We developed a density-independent spatial metric quantifying macrophage-T cell proximity and assessed its prognostic value. Experimental Design: Using multiplex immunohistochemistry across three breast cancer cohorts (exploratory, n = 17; discovery, n = 687; validation, n = 305), we measured nearest-neighbor distances from T cells to M1-like and M2-like macrophages, benchmarked against a randomly subsampled total macrophage pool. We defined the Macrophage Spatial Polarity Index (MSPI) as the difference between M2-to-T cell and M1-to-T cell affinity scores, where higher values reflect an M2-dominated spatial phenotype. Cox regression was used to assess associations with distant disease-free survival (discovery) and overall survival (validation). Results: M2-like macrophages preferentially localized near T cells, independent of cell density. Higher MSPI was associated with shorter survival in luminal cancers (discovery: HR = 1.45, p < 0.001), with the strongest effect in young women with early-stage disease (HR = 2.16, p < 0.0001). MSPI remained independently prognostic after adjustment for stage, systemic treatment, and diagnosis period (HR = 2.31, 95% CI 1.73-3.09, p < 0.0001) and was non-significant in HER2-positive and triple-negative subtypes. Validation in an independent ER-positive cohort confirmed the finding (HR = 1.30, p = 0.004). Pooled analysis yielded HR = 2.13 (95% CI 1.68-2.70, p = 3.45 × 10^-10). Conclusions: MSPI is a robust prognostic biomarker in luminal breast cancer, particularly in young women with early-stage disease, warranting further validation for risk stratification and therapeutic guidance.

5

Histology-Derived Signatures Predict Recurrence Risk and Chemotherapy Benefit in Randomized Trials of Early Breast Cancer

Howard, F. M.; Li, A.; Kochanny, S.; Sullivan, M.; Flores, E. M.; Dolezal, J.; Khramtsova, G.; Hassan, S.; Medenwald, R.; Saha, P.; Fan, C.; McCart, L.; Watson, M.; Teras, L. R.; Bodelon, C.; Patel, A. V.; Symmans, W. F.; Partridge, A.; Carey, L.; Olopade, O. I.; Stover, D.; Perou, C.; Yao, K.; Pearson, A. T.; Huo, D.

2026-04-24 oncology 10.64898/2026.04.23.26351499 medRxiv

Top 0.1%

7.1%

Purpose: To test whether histology-derived gene-expression signatures from routine hematoxylin and eosin slides are prognostic for recurrence and predictive of chemotherapy benefit in early breast cancer. Methods: We conducted a multi-cohort study including CALGB 9344 (anthracycline ± paclitaxel), CALGB 9741 (standard vs dose-dense chemotherapy), a pooled Chicago real-world cohort, and the American Cancer Society (ACS) Cancer Prevention Studies-II and -3. Whole-slide images were processed with a previously described pipeline to generate 61 histology-derived signatures per patient. The primary endpoint was distant recurrence-free interval (DRFI), except in ACS, where breast cancer-specific survival was used. Secondary endpoints included distant recurrence-free survival (DRFS) and overall survival. The most prognostic signature in CALGB 9344, selected by Harrell's C-index, was evaluated in additional cohorts. Signature-treatment interaction was assessed by likelihood-ratio tests. Multivariable Cox models incorporating age, tumor size, nodal status, estrogen/progesterone receptor status, and signature were fit in CALGB 9344 to improve risk stratification. Results: A total of 7,170 patients were included across four cohorts. The top histology-derived signature in CALGB 9344 showed strong prognostic performance for 5-year DRFI (C-index 0.63) and performed well across validation cohorts (C-index 0.60, 0.70, and 0.62 in CALGB 9741, Chicago, and ACS, respectively). The strongest predictive signal for treatment benefit was observed for DRFS. High-risk cases identified by the signature demonstrated greater benefit from taxane in CALGB 9344 (adjusted hazard ratio [aHR] 0.76 for DRFS, 95% CI 0.66-0.88; interaction p=0.028), from dose-dense chemotherapy in CALGB 9741 (aHR 0.69, 95% CI 0.56-0.85; interaction p=0.039), and differential chemotherapy benefit in the Chicago cohort (aHR 0.84, 95% CI 0.59-1.21; interaction p=0.009). Combined clinical-histology models improved risk stratification and identified low-risk groups with a 2%-10% risk of distant recurrence or breast cancer death. Conclusion: Histology-derived signatures from H&E images are broadly prognostic and, unlike clinical factors, may predict chemotherapy benefit. Highlights: Histology-derived H&E signatures consistently predicted recurrence risk across randomized trials and real-world cohorts. A single cutoff of a low-risk histology signature predicted taxane benefit and dose-dense chemotherapy benefit. Combined clinical-histology models identified low-risk groups with 2%-10% risk of distant recurrence.
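The abstract above ranks signatures by Harrell's C-index. As a rough illustration of what that statistic measures, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the function name `harrell_c_index`, the choice to skip pairs with tied follow-up times, and the toy data are all assumptions made for this example.

```python
from itertools import combinations


def harrell_c_index(times, events, scores):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times  : follow-up time per patient
    events : 1 if the event was observed, 0 if censored
    scores : risk score per patient (higher = higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplifying assumption: pairs with tied times are skipped
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter follow-up is censored, so the true order is unknown
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0  # higher risk score failed first: concordant
        elif scores[a] == scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented purely for illustration.
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 1, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.7, 0.3, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_scores)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values of 0.60 to 0.70 should be read.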

6

Racioethnic Disparities in Risk of Cardiometabolic Risk Factors and Cardiovascular Disease among Women Treated for Breast Cancer: The Pathways Heart Study

Yao, S.; Zimbalist, A.; Sheng, H.; Fiorica, P.; Cheng, R.; Medicino, L.; Omilian, A.; Zhu, Q.; Roh, J.; Laurent, C.; Lee, V.; Ergas, I.; Iribarren, C.; Rana, J.; Nguyen-Huynh, M.; Rillamas-Sun, E.; Hershman, D.; Ambrosone, C.; Kushi, L.; Greenlee, H.; Kwan, M.

2026-04-24 epidemiology 10.64898/2026.04.23.26351612 medRxiv

Top 0.1%

6.9%

Racial and ethnic disparities exist in cardiovascular disease (CVD) burden in the general population, yet surprisingly few studies have examined such disparities in breast cancer patients, who are at higher risk due to cardiotoxic therapy. We investigated the incidence of CVD and cardiometabolic risk factors across Asian, non-Hispanic Black (NHB), Hispanic, and non-Hispanic White (NHW) women with a history of breast cancer. In 4,071 women with breast cancer from a prospective cohort, the incidence of cardiometabolic risk factors and CVD occurring after breast cancer diagnosis was analyzed by self-identified race and ethnicity (SIRE) and global genetic ancestry. Racial and ethnic differences existed in the prevalence of cardiometabolic risk factors and CVD before breast cancer diagnosis, and these differences continued to manifest in incident cases after cancer treatment. Asian, NHB, and Hispanic women were all at higher risk of diabetes than NHW women. Nonetheless, only NHB women had higher risk of CVD events, and Hispanic women were at lower risk. The apparent lower risk of CVD in Asian women largely disappeared after adjustment for covariates. Similar differences across SIRE groups were found in the cardiotoxic chemotherapy subgroup and the subgroup without chemotherapy, except that any CVD and venous thromboembolism (VTE) showed modifying effects of cardiotoxic chemotherapy. Analyses of genetic ancestry revealed similar results to SIRE. Our study reveals racial and ethnic disparities in cardiometabolic risk factors and CVD events before and after breast cancer diagnosis. Clinical and research attention is warranted to bridge the population-level gaps in CVD morbidity and mortality. Statement of Significance: Our study provides strong evidence for racial and ethnic disparities in cardiovascular disease before and after breast cancer diagnosis. Clinical and research attention is warranted to bridge these population-level gaps.

7

Development and Validation of a Multimodal Clinical, Pathologic, and Genomic Model for Breast Cancer Recurrence

Nguyen, N.-K.; Li, A.; Kochanny, S.; Dolezal, J.; Ramesh, S.; Shamai, G.; Zhao, J.; Nanda, R.; Chen, N.; Olopade, O. I.; Sullivan, M.; Flores, E. M.; Khramtsova, G.; Jain-Liu, S.; Medenwald, R.; Saha, P.; McCart, L.; Watson, M.; Symmans, W. F.; Kalinsky, K.; Pusztai, L.; Gala, M.; Paul, E. D.; Huraiova, B.; Cekan, P.; Partridge, A. H.; Carey, L.; Stover, D.; Yao, K.; Sparano, J. A.; Huo, D.; Pearson, A. T.; Howard, F. M.

2026-05-12 oncology 10.64898/2026.05.08.26352562 medRxiv

Top 0.1%

6.8%

Purpose: To develop and validate a multimodal recurrence-risk model integrating histology, genomic testing, and clinical variables. Methods: We developed AI-Path, a whole-slide image biomarker for recurrence prediction trained in CALGB 9344, and validated it in three independent cohorts: TAILORx, a multi-site Chicago cohort, and the MDX-BRCA cohort. We then integrated AI-Path with Oncotype DX Recurrence Score (RS), tumor size, and nodal status into a Cox model, PathClinRS, fit using 60% of cases from TAILORx, with the remaining 40% held out for validation. The primary end point was distant recurrence-free interval. Performance was assessed using Harrell's concordance index (C-index) and Kaplan-Meier analyses. Results: A total of 12,418 patients were included. In TAILORx, AI-Path outperformed RS for distant recurrence (C-index, 0.682 vs 0.647; P = .038), driven by superior prediction of late recurrence (0.656 vs 0.567; P < .001). In node-negative disease, PathClinRS outperformed RSClin in the TAILORx fitting (0.72 vs 0.70; P = .016) and validation sets (0.74 vs 0.70; P = .004). In node-positive disease, PathClinRS outperformed RSClinN+ in Chicago (0.94 vs 0.74; P < .001) and MDX-BRCA (0.71 vs 0.66; P = .004) cohorts. Compared with NATALEE eligibility, PathClinRS identified nearly twice as many high-risk node-negative patients while maintaining a comparable 10-year distant recurrence risk (16.7% vs 16.6% per NATALEE eligibility in TAILORx fitting; 21.0% vs 19.4% in TAILORx validation). PathClinRS identified 68% of intermediate-risk premenopausal patients as low risk with no evidence of chemotherapy benefit, compared to only 36% identified as low risk by standard clinicopathologic criteria. Conclusion: Digital histopathology provides prognostic information complementary to genomic assays and has the potential to personalize therapy beyond existing clinicogenomic tools.

8

Artificial Intelligence and Circulating microRNA Signatures for Early Breast Cancer Detection: A Systematic Review and Meta-Analysis

Solanki, S.; Solanki, N.; Prasad, J.; Prasad, R.; Harsulkar, A.

2026-03-30 oncology 10.64898/2026.03.29.26349657 medRxiv

Top 0.1%

6.8%

Background: Early breast cancer detection remains central to improving clinical outcomes, yet conventional screening pathways, particularly mammography, have recognized limitations in sensitivity, specificity, and performance in dense breast tissue. Circulating microRNAs (miRNAs) have emerged as promising minimally invasive biomarkers, while artificial intelligence and machine learning (AI/ML) offer powerful tools for identifying diagnostically relevant multi-marker patterns within complex biomarker datasets. This systematic review and meta-analysis evaluated the diagnostic performance of AI/ML-based circulating miRNA signatures for early breast cancer detection. Methods: A systematic search of PubMed/MEDLINE, Scopus, and Web of Science Core Collection was conducted from database inception to 31 December 2025. Studies were eligible if they were original human investigations evaluating circulating miRNAs using an AI/ML-based diagnostic model for breast cancer detection and reporting extractable diagnostic performance metrics. Study selection followed PRISMA 2020 and PRISMA-DTA guidance. Methodological quality was assessed using QUADAS-2. Pooled sensitivity and specificity were synthesized using a bivariate random-effects model, and overall diagnostic performance was summarized using a hierarchical summary receiver operating characteristic framework. Results: Seven studies met the inclusion criteria for qualitative synthesis, with eligible studies contributing to the quantitative analysis depending on data availability. Across the pooled analysis, AI/ML-based circulating miRNA models demonstrated good overall diagnostic performance, with a pooled AUC of 0.905 (95% CI: 0.890 to 0.921), pooled sensitivity of 81.3% (95% CI: 76.8% to 85.2%), and pooled specificity of 87.0% (95% CI: 82.4% to 90.7%). Heterogeneity was moderate for AUC (I² = 42.3%) and sensitivity (I² = 38.7%) and low for specificity (I² = 28.4%). Risk-of-bias assessment showed overall low-to-moderate methodological concern, with patient selection representing the most variable domain. Deeks' funnel plot asymmetry test showed no significant evidence of publication bias (p = 0.34). Conclusions: AI/ML-based circulating miRNA signatures show promising diagnostic accuracy for early breast cancer detection and may have value as noninvasive adjunctive tools within imaging-supported diagnostic pathways. However, the evidence base remains limited by methodological heterogeneity, variable validation rigor, and the predominance of retrospective case-control designs. Prospective, standardized, and externally validated studies are needed before routine clinical implementation can be justified.

9

The tumour microenvironment influences long-term tamoxifen benefit in postmenopausal ER+/HER2- breast cancer patients.

Camargo Romera, P.; Castresana Aguirre, M.; Danielsson, O.; Dar, H.; Ostman, A.; Czene, K.; Lindstrom, L. S.; Tobin, N. P.

2026-03-26 oncology 10.64898/2026.03.24.26349151 medRxiv

Top 0.1%

6.7%

Background: The tumour microenvironment (TME) influences breast cancer progression and treatment response. We investigated whether TME composition predicts tamoxifen benefit in postmenopausal women with oestrogen receptor-positive, HER2-negative (ER+/HER2-) breast cancer. Methods: This study included 513 patients from the Stockholm Tamoxifen (STO-3) trial, which randomised postmenopausal, lymph node-negative women to tamoxifen or no endocrine therapy. Bulk tumour transcriptomes were deconvoluted with the ConsensusTME algorithm to estimate the relative abundance of 18 immune and stromal cell types. A summary score of combined immune cells was created on a per-patient basis and evaluated alongside fibroblast and endothelial stromal compartments. Patients were categorised into immune and stromal tertiles on the basis of these scores. Associations between TME composition and tumour characteristics were evaluated using Spearman correlations and Fisher's exact test. Tamoxifen benefit was analysed by univariable Kaplan-Meier (log-rank) and multivariable Cox proportional hazards adjusting for age, tumour size, grade, progesterone receptor, Ki-67, and radiotherapy. Differential expression was assessed with limma and pathway enrichment with fgsea using Hallmark gene sets from MSigDB. Results: Low immune abundance was significantly associated with higher ER expression (Fisher's exact test p < 0.001). Among tamoxifen-treated patients, those with low immune scores showed improved distant recurrence-free interval (DRFI) relative to untreated patients (log-rank p < 0.001). Similarly, intermediate endothelial (p < 0.001) and low/intermediate fibroblast abundances (p = 0.042, p = 0.009) were associated with favourable DRFI. In multivariable models, low immune (aHR = 0.17, 95% CI 0.08-0.40), intermediate endothelial (aHR = 0.21, 95% CI 0.09-0.51), and low/intermediate fibroblast tertiles (aHR = 0.50, 95% CI 0.27-0.93; aHR = 0.36, 95% CI 0.17-0.77) retained significance. Transcriptomic analysis revealed enrichment of oestrogen-response, MYC-target, and oxidative-phosphorylation pathways in low-immune and low-fibroblast tumours, while interferon-γ response and allograft rejection pathways were downregulated. Conclusions: TME composition modulates tamoxifen benefit in postmenopausal ER+/HER2- breast cancer. Low immune, intermediate endothelial, and low/intermediate fibroblast abundances are associated with improved benefit from tamoxifen, suggesting that both immune and stromal compartments influence endocrine treatment efficacy.

10

Integrated Multi-Omics Analysis for the Identification of Disease-Associated Variations and Prognostic Biomarkers in Triple-Negative Breast Cancer (TNBC)

Mannekunta, N.; Natrajan, E.

2026-05-06 bioinformatics 10.64898/2026.05.03.722461 medRxiv

Top 0.1%

6.3%

Background: Triple-negative breast cancer (TNBC) exhibits substantial molecular heterogeneity and lacks targeted receptor therapies. Single-omic approaches inadequately capture its regulatory complexity, necessitating integrated multi-omic frameworks to identify stable prognostic signatures. Methods: Matched transcriptomic and DNA methylation data from the TCGA-BRCA cohort were normalised and mathematically integrated to isolate disease-associated variations. A calibrated machine learning voting ensemble (comprising LightGBM, Random Forest, and Logistic Regression) was trained to predict clinical survival. Model generalisability was tested on an independent microarray cohort (GSE58812) using independent quantile normalisation. SHAP (SHapley Additive exPlanations) values provided biological interpretability. Results: Differential and integrative analyses identified a 47-gene master prognostic signature. The ensemble classifier achieved an external validation accuracy of 74.77% (AUC 0.590) on unseen clinical patients. SHAP analysis confirmed the biological directionality of these specific biomarkers in driving mortality. Hypergeometric pathway enrichment highlighted targetable metabolic and signalling networks. Conclusions: This multi-omic machine learning pipeline identifies a highly prognostic 47-gene signature for TNBC. The model demonstrates strong cross-platform generalisability and offers interpretable clinical utility for stratifying patient risk and guiding future therapeutic target development.

11

CBFB mutations predict endocrine therapy benefit in estrogen receptor-positive breast cancer

Yaacov, A.; Passi, G.; Gillis, R.; Katz, D.; Grinshpun, A.

2026-05-21 oncology 10.64898/2026.05.18.26353467 medRxiv

Top 0.1%

6.2%

Purpose: Beyond estrogen receptor (ER) positivity, no genomic biomarker reliably identifies ER+ breast cancer patients who derive differential benefit from endocrine therapy (ET). We performed an unbiased genomic screen to discover genes predicting ET response and characterized the top candidate across clinical settings, treatment modalities, and an independent validation cohort. Experimental Design: We screened 240 genes in 1,197 metastatic ET-treated patients from the MSK-CHORD clinical genomics database using Cox proportional hazards regression with false discovery rate (FDR) correction. The top candidate, core-binding factor subunit beta (CBFB), was characterized across four cohorts defined by disease setting (metastatic/adjuvant) and treatment (ET/chemotherapy), with multivariable adjustment, gene-by-treatment interaction testing, left-truncation sensitivity analysis for guarantee-time bias, and external validation in METABRIC (N = 1,499 ER+). Results: CBFB (mutation prevalence, ~5%) was the only gene whose mutations were associated with improved time to progression (TTP). In metastatic ET patients, CBFB-mutated tumors (n = 80) demonstrated significantly longer TTP (hazard ratio [HR], 0.44; 95% CI, 0.29-0.67; P = .0002, FDR q = .010) with no chemotherapy benefit (HR, 1.16; P = .65). The gene-by-treatment interaction was significant (HR, 0.37; P = .009). Effects were robust to multivariable adjustment (HR, 0.46-0.50), independent of histology, and preserved under left-truncated Cox regression (HR, 0.38). In the adjuvant setting, CBFB mutations predicted improved recurrence-free survival (HR, 0.52; 95% CI, 0.31-0.85; P = .010), with no effect under chemotherapy. In METABRIC, CBFB mutations predicted improved ER+ overall survival (HR, 0.52; P = 9.3 × 10^-5). Conclusions: CBFB mutations identify ~5% of ER+ breast cancers with exceptional ET benefit. As CBFB is included on all major cancer gene panels, this biomarker requires no additional testing infrastructure for clinical implementation.

12

Incidental Non-Breast Malignancies in a Consecutive Forensic Autopsy Cohort: Secondary Findings from the Sisyphus Study

Sidiropoulou, Z.; Santos, C.

2026-05-06 oncology 10.64898/2026.05.05.26352437 medRxiv

Top 0.1%

5.1%

Background: Forensic autopsy cohorts can help estimate the burden of clinically unrecognised cancer that is not captured by routine incidence statistics. We report incidental non-breast malignancies identified as secondary findings in the Sisyphus Study, a prospective forensic autopsy cohort originally established to investigate silent breast cancer prevalence. Methods: This was a descriptive secondary analysis of 291 consecutive medicolegal autopsies performed in Lisbon, Portugal, between July 2016 and December 2019 (74 male and 217 female decedents). Key exclusions relevant to the present analysis were age below 40 years, major breast-region injury, and known or clinically evident cancer. An incidental cancer was defined as a histologically confirmed malignancy identified at autopsy in an individual without a prior clinical cancer diagnosis. Results: Fifteen incidental non-breast malignancies were identified among 291 decedents, yielding an overall prevalence of 5.15%. Prevalence was 6.76% in males (5/74) and 4.61% in females (10/217). Male findings comprised two colorectal adenocarcinomas, one pancreatic metastatic adenocarcinoma, one gastric adenocarcinoma, and one splenic lymphoma. Female findings comprised six colorectal adenocarcinomas, two lung adenocarcinomas, one perforated gastric adenocarcinoma, and one ovarian metastatic adenocarcinoma. Colorectal malignancies accounted for 8 of 15 cases (53.3%). Metastatic disease was documented in at least five cases, and perforation was present in two gastrointestinal tumours. None of the affected individuals had a prior cancer diagnosis during life. Conclusions: This cohort demonstrates a measurable burden of clinically silent non-breast cancer, including advanced and potentially fatal disease. Forensic autopsy surveillance may complement conventional cancer surveillance by identifying malignancies that remain invisible to clinical registries. The predominance of colorectal cancer in this series is consistent with missed opportunities for earlier detection, although individual screening histories were unavailable.
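The prevalence figures in the abstract above follow directly from the reported counts. The short sketch below recomputes them as a sanity check; the helper name `prevalence_pct` is hypothetical and the rounding convention (two decimals for prevalence, one for the colorectal share) is assumed from how the abstract reports them.

```python
def prevalence_pct(cases, total, digits=2):
    """Return cases/total as a percentage rounded to `digits` decimal places."""
    return round(100 * cases / total, digits)


# Counts taken from the abstract: incidental non-breast malignancies / decedents.
overall = prevalence_pct(15, 291)
males = prevalence_pct(5, 74)
females = prevalence_pct(10, 217)
# Share of the 15 incidental cancers that were colorectal.
colorectal_share = prevalence_pct(8, 15, digits=1)

print(overall, males, females, colorectal_share)
```

With these counts the recomputed values agree with the percentages stated in the abstract.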

13

Genetic prediction of long-term effects of aromatase inhibition on cancer and non-neoplastic disease risk

Ray, D.; Bate, T.; O'Mara, T. A.; Sasieni, P.; Gunter, M. J.; Martin, R. M.; Smith-Byrne, K.; Haycock, P.; Yarmolinsky, J.

2026-04-29 epidemiology 10.64898/2026.04.28.26351848 medRxiv

Top 0.1%

4.8%

Background: Anastrozole, an aromatase inhibitor, is approved for breast cancer prevention in high-risk women. The long-term effects of aromatase inhibition, including its repurposing potential to other cancers, possible adverse effects, and treatment effect heterogeneity across patient subgroups, remain unclear. Methods: We used the rs727479 variant in CYP19A1 to mimic the effect of long-term pharmacological aromatase inhibition. To evaluate repurposing opportunities, genetic association data on five cancers (211,386 cases, 684,665 controls) were obtained from genome-wide association study consortia. Potential adverse effects were evaluated in a phenome-wide association study (PheWAS) of 449 health-related traits in 162,360 postmenopausal women in the UK Biobank. Effects were investigated across clinically relevant subgroups in the UK Biobank including those defined by body mass index (BMI). Results: Genetically-proxied aromatase inhibition was associated with reduced risk of ER+ breast cancer (OR: 0.78, 95% CI: 0.67-0.92) and decreased heel bone mineral density (-0.32 SD change, 95% CI: -0.36, -0.28). When examining the repurposing potential of anastrozole to other cancers, we found that genetically-proxied aromatase inhibition reduced endometrial cancer risk (OR: 0.34, 95% CI: 0.26-0.44). In PheWAS, genetically-proxied aromatase inhibition was associated with 6 outcomes (P_FDR < 0.05) including reduced risk of endometrial polyps (OR: 0.58, 95% CI: 0.45-0.74) and postmenopausal bleeding (OR: 0.67, 95% CI: 0.54-0.83), with stronger effects in women with higher BMI (P_LRT = 1.26 × 10^-3 and 0.02, respectively). Conclusion: Our genetic analyses recapitulate known effects of aromatase inhibition on breast cancer risk and highlight potential repurposing for endometrial cancer prevention. Limited evidence of adverse effects beyond bone mineral density was observed, and subgroup analyses suggested that women with higher BMI may experience greater protection against endometrial conditions.

14

MOSAIC: Explainable AI for Reproducible Histologic Grading and Prognostic Stratification in Breast Cancer

Sonpatki, P.; Gupta, S.; Biswas, A.; Patil, S.; Tyagi, S.; Balakrishnan, L.; Mistry, H.; Doshi, P.; Jagadale, K.; Shelke, P.; Parikh, L.; Shah, M.; Bharadwaj, R.; Desai, S.; Kulkarni, M.; Koppiker, C. B.; Prabhu, J.; Kachchhi, U.; Shah, N.

2026-03-18 pathology 10.64898/2026.03.11.26348043 medRxiv

Top 0.1%

4.8%


Nottingham histologic grading is essential for breast cancer prognostication but suffers from inter-observer variability in assessing mitotic activity, nuclear pleomorphism, and tubule formation. We developed MOSAIC (Mammary Oncology Spatial Analysis and Intelligent Classification), an explainable AI framework designed to perform component-wise grading by independently modeling these three histologic features. Model outputs were calibrated using a two-phase pathology study to establish clinically reproducible scoring thresholds and were subsequently evaluated across public datasets and multi-institutional Indian cohorts. MOSAIC demonstrated robust performance, with AI-derived grades providing independent prognostic information (HR ≥ 1.8 in two datasets, p < 0.001) and improved survival stratification compared to traditional methods. In pathologist calibration studies, AI-assisted scoring significantly reduced variability, specifically achieving near-perfect agreement in mitotic scoring with a weighted κ up to 0.98. Accuracy and Cohen's kappa (κ) analysis further characterized the model's technical performance across components: tubule formation showed the highest agreement (accuracy ≥ 0.6607, κ = 0.549), followed by overall grade (accuracy = 0.5637, κ = 0.539) and mitotic activity (accuracy = 0.4985, κ = 0.4), while nuclear pleomorphism proved the most challenging (accuracy = 0.3303, κ = 0.271). Comparative survival models confirmed that AI-derived grades were more significant predictors of risk than manual pathologist-assigned grades, with the AI model yielding a superior global p-value (5.9 × 10^-7) and lower AIC (769.61). These results indicate that MOSAIC enables reproducible, interpretable grading by decomposing assessment into pathology-aligned components. By enhancing consistency while preserving prognostic relevance, this framework supports explainable AI as a viable assistive tool for routine breast cancer pathology.
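The abstract reports both accuracy and Cohen's kappa for each grading component. As a minimal sketch of how the two differ, the Python below computes unweighted Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The function name and the toy scores are hypothetical and are not data from the study; the weighted kappa quoted for mitotic scoring additionally gives partial credit to near-misses and is not shown here.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length score lists")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal frequencies, summed over scores.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical component scores (1-3) for six slides; NOT data from the study.
ai_scores = [1, 1, 2, 2, 3, 3]
pathologist_scores = [1, 1, 2, 3, 3, 3]

accuracy = sum(a == b for a, b in zip(ai_scores, pathologist_scores)) / len(ai_scores)
kappa = cohen_kappa(ai_scores, pathologist_scores)
print(f"accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

In this toy case five of six scores match, yet kappa is lower than accuracy because a third of the agreement would be expected by chance alone. The same effect explains why a component can show an accuracy near 0.5 with a kappa of 0.4.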

15

A Cross-Cohort Validated Plasma Lipid Biomarker Assay for Early Breast Cancer Detection Using Machine Learning

Huang, T.; Koch, F. C.; Peake, D. A.; Adam, K.-P.; David, M.; Li, D.; Heffernan, K.; Lim, A.; Hurrell, J. G.; Preston, S.; Baterseh, A.; Vafaee, F.

2026-04-23 oncology 10.64898/2026.04.23.26351564 medRxiv

Top 0.1%

4.5%


Early detection of breast cancer remains essential for improving clinical outcomes, and complementary non-invasive approaches are needed to support existing screening methods, particularly for women with dense breast tissue. We have previously reported plasma lipid biomarker discovery using untargeted high-resolution liquid chromatography tandem mass spectrometry (LC-MS/MS). In this study, we performed biomarker confirmation and developed machine-learning models applied to targeted plasma lipid measurements for the non-invasive detection of early-stage breast cancer across international cohorts with independent external validation. Targeted LC-MS/MS was used to quantify candidate lipid panels in plasma samples from European discovery cohorts (n = 554) and an independent Australian cohort (n = 266) used for external validation. Data-driven feature selection identified a 15-lipid panel with strong performance in European cohorts (AUC ≥ 0.94). External validation prior to confidence stratification yielded 76% sensitivity, 64% specificity, and an AUC of 0.81 in the Australian validation cohort. Clinical assay development requires iterative panel and model testing to support translational feasibility and performance in the intended-use population. An analytically viable panel, excluding lipids requiring complex and costly synthesis, achieved comparable accuracy with improved assay robustness. Confidence-based analysis showed enhanced performance for predictions made with moderate to high confidence, with sensitivity up to 89% and AUC up to 0.85, suggesting that ongoing research should focus on strategies to enhance diagnostic model confidence. Importantly, model predictions were independent of breast density, tumour size, grade, subtype, and morphology, indicating biological specificity of the lipid signature. These results demonstrate that calibrated machine-learning models applied to plasma lipid biomarkers can support non-invasive breast cancer detection. Expanding training datasets to include greater diversity will further improve performance in the ongoing development of this lipid-based detection approach.

16

Connecting Baseline Immune Exhaustion in Hot Tumors to Oral Cancer Recurrence and Nodal Metastasis

Shaikh, S.; Basu, S.; Hajihosseini, M.; Nandy, S. K.; Moorthy, M.; Arun, I.; Lali, B. S.; Arun, P.; Mukherjee, G.; Pyne, S.

2026-05-30 oncology 10.64898/2026.05.27.26354295 medRxiv

Top 0.1%

4.3%


Background: The use of immune checkpoint inhibitors (ICIs) in the treatment of cancer has rapidly expanded over the last decade. However, there are several knowledge gaps in understanding how tumor cells evade the immune system. There is a paucity of data on HPV-negative oral cancer, particularly of the gingivobuccal region. Understanding the mechanism of immune system evasion in this cancer is vital for improving patient outcomes. Methods: We characterized the baseline immune milieu of oral cancer using immunohistochemistry (IHC) on whole tumor sections from 124 cases. Tumors were classified as hot or cold and further stratified into high-risk and low-risk groups. High-risk patients included those with lymph node metastasis at diagnosis/recurrence or distant metastasis within 2 years of treatment completion. Patients without these features were categorized as low risk. Validation by RNA-Seq and Joint Enrichment Analysis of Oncogenic and Immunologic Pathways was carried out in a subset of 46 cases. Results: Hot high-risk tumors (by IHC) were distinguished by elevated PD-L1 expression and reduced NK-cell, PD1, and CTLA-4 expression. There was no difference in the expression levels of CD3+, CD8+, granzyme, or perforin compared to hot low-risk tumors, findings that align with the definition of hot tumors. RNA-Seq revealed a gene signature associated with exhausted T-cells in hot high-risk tumors. Gene and pathway analyses identified differential upregulation of isoform-specific TOX, TCF, CXCR, RUNX, IRF, BRD and BCL6 genes, implicating immune cell exhaustion and tumor aggressiveness. Significantly downregulated genes included PDCD1, HAVCR2, ZAP70, and STAT, indicative of a disabled immune microenvironment. These findings support the view that a state of immune exhaustion in hot high-risk (HHR) tumors is driven by progenitor exhausted T-cells and terminally exhausted T-cells, independent of PD1-TIM3. Conclusion: These findings suggest that combining TOX/TCF/BCL6 inhibitors with immune checkpoint inhibitors in the adjuvant setting might benefit patients with hot high-risk tumors. Given the results, testing for a targeted exhaustion-related gene panel at diagnosis is recommended for oral cancers to stratify tumors as high-risk or low-risk. Larger validation studies and clinical trials are now warranted.

17

A clinical pilot study for personalized risk-based breast cancer screening utilizing the polygenic risk score

Hovda, T.; Sober, S.; Padrik, P.; Kruuv-Kao, K.; Grindedal, E. M.; Vamre, T. B. A.; Eikeland, E.; Hofvind, S.; Sahlberg, K. K.

2026-03-16 radiology and imaging 10.64898/2026.03.07.26347839 medRxiv

Top 0.1%

4.1%


Background: Population-based mammographic screening is primarily age-based. However, breast cancer risk is multifactorial, and women may benefit from personalized risk-based screening. This pilot study aimed to explore the use of polygenic risk score (PRS) as a tool for risk stratification in personalized screening. Methods: We included 80 women aged 40-49 years referred for clinical mammography. Exclusion criteria were prior breast cancer or premalignant breast disease, and previous genetic testing. After DNA collection, PRS was calculated from 2,805 single nucleotide polymorphisms (SNPs). Screening recommendations were based on each participant's relative 10-year breast cancer risk, estimated from PRS and compared with the 10-year risk of an average woman of the same age. Women with a self-reported family history of cancer meeting standard criteria were referred for gene panel testing for pathogenic variants in high-risk genes. A follow-up questionnaire regarding participants' experiences was distributed 6-9 months after PRS testing. Results: Mean age was 45.2 years (SD 2.8). Mean relative 10-year breast cancer risk was 1.18 (SD 0.57). Based on PRS, 40 participants were recommended standard biennial screening at ages 50-69 years, while 40 were advised to begin biennial screening before age 50. Among these, 7 were recommended annual mammography from when their 10-year risk reached twice that of an average 50-year-old. Twenty-one women underwent gene panel testing; no pathogenic variants in breast cancer genes were identified. Five women were advised annual mammography from 40-60 years due to family history of breast cancer, regardless of PRS. Most respondents viewed breast cancer risk assessment positively and did not report increased anxiety after testing. Conclusions: Polygenic risk score testing may influence current screening recommendations and contribute to more personalized risk-based breast cancer screening strategies.

18

Semaglutide is associated with improved breast cancer survival, lower metastatic burden, and a dose-survival relationship uncoupled from weight-loss magnitude

Murugadoss, K.; Venkatakrishnan, A. J.; Soundararajan, V.

2026-04-24 oncology 10.64898/2026.04.23.26351609 medRxiv

Top 0.1%

3.9%


Metabolic dysfunction is increasingly recognized as a risk factor for poor outcomes in breast cancer, but whether incretin-based therapies confer survival benefit beyond weight loss remains unresolved. Using a federated electronic health record platform spanning nearly 29 million patients, we evaluated breast cancer survival after semaglutide and tirzepatide initiation in routine care. In 1:1 propensity-matched pooled-comparator analyses, semaglutide was associated with improved overall survival versus metformin, sodium-glucose cotransporter 2 (SGLT2) inhibitor, and dipeptidyl peptidase 4 (DPP4) inhibitor users, with 54 deaths among 2,433 semaglutide users (2.2%) versus 395 deaths among 2,433 comparators (16.2%) over 24 months (log-rank P < 0.001). Tirzepatide showed a favorable survival association relative to pooled anti-diabetic comparators that did not meet statistical significance (P = 0.24), with 3 deaths among 220 users (1.4%) versus 64 deaths among 220 comparators (29.1%). In a head-to-head propensity-score-matched comparison, overall survival did not differ significantly between semaglutide- and tirzepatide-treated patients with pre-existing breast cancer (2,117 per arm; P = 0.12). In semaglutide-treated patients alive and observable at the 1-year landmark, higher maximum dose achieved was significantly associated with lower post-landmark mortality (P = 0.034), with an event rate of approximately 1.0% in the high-dose group (≥1.7 mg) versus approximately 4.5% in the low-dose group (0.25-1.0 mg). Despite a linear dose-weight loss relationship for semaglutide, however, weight-loss strata did not separate survival outcomes (global P = 0.22). In tirzepatide-treated patients alive and observable at the same landmark, neither maximum dose achieved nor weight-loss strata separated post-landmark survival (P = 0.98 and P = 0.50, respectively). Structured EHR and AI-based clinical-note analyses further showed significantly lower frequency of documented metastatic disease in semaglutide-treated patients relative to pooled anti-diabetic comparators, including any metastasis (7.0% versus 15.0%, rate ratio 0.5, P < 0.001), bone metastasis (1.0% versus 5.2%, rate ratio 0.2, P < 0.001), and liver, lung, or brain metastases (all P < 0.001). LLM-derived cause-of-death extraction further showed a 60% lower relative proportion of cancer-associated deaths in semaglutide-treated patients (19% of ascertainable deaths) than in matched pooled anti-diabetic comparators (47% of ascertainable deaths), with comparator deaths more often attributed to cancer progression involving metastatic breast cancer, leptomeningeal carcinomatosis, and cancer-driven organ failure. Overall, this study demonstrates that semaglutide use in patients with pre-existing breast cancer is associated with a dose-correlated but weight-loss independent improvement in overall survival. These findings motivate prospective trials of GLP-1 receptor agonists in breast cancer across various stages and treatment settings.
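The crude percentages, rate ratios, and the "60% lower" figure quoted in this abstract can be re-derived from the counts it reports. The Python below is a small arithmetic check under that assumption. The helper names are illustrative, and the sketch does not reproduce the propensity matching or log-rank testing behind the reported P values.

```python
def pct(events, total):
    """Crude percentage of patients with an event."""
    return 100.0 * events / total


def rate_ratio(treated_pct, comparator_pct):
    """Ratio of two event frequencies expressed in the same units."""
    return treated_pct / comparator_pct


def relative_reduction(treated_pct, comparator_pct):
    """Relative reduction of the treated frequency versus the comparator."""
    return 1.0 - treated_pct / comparator_pct


# Counts and percentages as stated in the abstract.
sema_mortality = pct(54, 2433)               # semaglutide users
sema_comp_mortality = pct(395, 2433)         # matched pooled comparators
tirz_mortality = pct(3, 220)                 # tirzepatide users
tirz_comp_mortality = pct(64, 220)           # matched pooled comparators

any_met_rr = rate_ratio(7.0, 15.0)           # any metastasis
bone_met_rr = rate_ratio(1.0, 5.2)           # bone metastasis
cancer_death_drop = relative_reduction(19.0, 47.0)  # share of ascertainable deaths

print(f"semaglutide: {sema_mortality:.1f}% vs {sema_comp_mortality:.1f}%")
print(f"tirzepatide: {tirz_mortality:.1f}% vs {tirz_comp_mortality:.1f}%")
print(f"rate ratios: any {any_met_rr:.2f}, bone {bone_met_rr:.2f}")
print(f"relative reduction in cancer-associated deaths: {cancer_death_drop:.0%}")
```

The unrounded rate ratios are about 0.47 and 0.19, which the abstract reports to one decimal place as 0.5 and 0.2.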

19

Targeted BRCA1/BRCA2 Sequencing in a Bangladeshi Clinically Referred Cohort Identifies Candidate BRCA1 Loss-of-Function Variants and a Multi-Exon Deletion-Like CNV Signal

Al Sium, S. M.; Banu, T. A.; Goswami, B.; Naser, S. R.; Habib, M. A.; Akter, S.; Ara, M. H.; Al Din, S. M. S.; Nafisa, A.; Nayem, M. R.; Rabbi, M. F. A.; Sarkar, M. M. H.; Khan, M. S.

2026-05-20 oncology 10.64898/2026.05.11.26352643 medRxiv

Top 0.1%

3.8%


Background: Population-relevant BRCA1/BRCA2 data from Bangladesh are scarce, creating challenges for hereditary breast and ovarian cancer variant interpretation, counseling, and follow-up testing. We examined a clinically referred Bangladeshi cohort to characterize assay-derived BRCA1/BRCA2 short variants, sequencing-depth performance, and copy-number findings in a conservative pilot framework. Methods: Twenty-three de-identified blood-derived DNA samples were assessed using a targeted BRCA1/BRCA2 next-generation sequencing workflow. Downstream analysis used assay-generated short-variant, coverage, and CNV outputs, with coordinates reported on hg19/GRCh37. Short variants were evaluated from high-confidence PASS/VCC-H calls, and CNV review incorporated both target-region and amplicon-level copy-number patterns. Results: After removal of four low-VAF review observations, the primary germline-compatible dataset comprised 304 short-variant observations representing 34 unique variants. Both BRCA1 and BRCA2 contributed comparable variant burdens, while the overall profile was mainly composed of missense and synonymous changes. Six sample-specific heterozygous BRCA1 truncating candidates were observed, including five frameshift variants and one stop-gain variant. Protein-level mapping placed these events across the central-to-C-terminal portion of BRCA1. Sequencing depth was consistently high across the targeted regions, with all 4,255 amplicon-sample measurements exceeding 280x and 99.91% reaching at least 500x. Copy-number analysis highlighted one candidate BRCA1 multi-exon deletion-like event involving exons 15-20 in BCSIR-BRCA-21, with unresolved partial exon 14 involvement. Conclusions: This study provides an initial Bangladesh-focused targeted BRCA1/BRCA2 dataset and identifies candidate short-variant and CNV findings for validation. These findings should be interpreted as analytical candidates only and require confirmatory testing and expert clinical curation before any clinical application. The cohort is referral-enriched and should not be used to infer population prevalence.

20

CBCRisk-Mastectomy: A Risk Prediction Tool to Aid Contralateral Prophylactic Mastectomy Decision Making

Sajal, I. H.; Pfeiffer, R. M.; Jatoi, I.; Gail, M. H.; Cecchini, R. S.; Choudhary, P. K.; Biswas, S.

2026-05-15 surgery 10.64898/2026.05.12.26352924 medRxiv

Top 0.1%

3.7%


Purpose: Unilateral breast cancer (BC) patients scheduled for mastectomy often choose to undergo contralateral prophylactic mastectomy (CPM), despite substantial declines in contralateral breast cancer (CBC) risk in recent decades. Models predicting absolute risk of future CBC can aid informed decision-making about CPM. CBCRisk is an existing CBC absolute risk prediction model trained on unilateral BC patients regardless of whether they had mastectomy. Here we developed CBCRisk-Mastectomy, tailored specifically to BC patients scheduled for mastectomy and considering CPM. Patients and Methods: We used data on BC patients who underwent mastectomy to treat their first BC from two nationally representative sources: Breast Cancer Surveillance Consortium (BCSC) and Surveillance, Epidemiology, and End Results (SEER) cancer registry. We imputed missing data in the BCSC sample and used conditional logistic regression models, trained on 2,660 BC patients (665 CBC cases) from BCSC, to identify predictors and estimate relative risks (RRs). These were combined with attributable risks and CBC incidence rates estimated from SEER to obtain absolute risk. Cross-validation was used to internally validate CBCRisk-Mastectomy and compare with CBCRisk. Results: CBCRisk-Mastectomy has nine predictors: first BC type, lobular carcinoma in situ status, estrogen receptor status, tumor stage, breast density, age at BC diagnosis, family history of BC, age at first birth, and body mass index. The areas under the curve and their 95% confidence intervals for 5-year predictions for CBCRisk-Mastectomy and CBCRisk were 0.62 (0.59, 0.65) and 0.58 (0.55, 0.61), respectively. Conclusions: CBCRisk-Mastectomy may aid clinicians in counseling BC patients scheduled for mastectomy, enabling improved decision-making regarding CPM.
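The abstract says relative risks were combined with attributable risks and incidence rates to obtain absolute risk, but it does not give the formula. The Python below sketches one common way such quantities are combined, under strong simplifying assumptions that are not taken from the abstract: hazards are constant over the prediction window, and a competing mortality hazard is included. The function name and all input values are hypothetical.

```python
import math


def absolute_risk(incidence, attributable_risk, relative_risk, competing_hazard, years):
    """Absolute risk of the event within `years`, assuming constant hazards.

    incidence          population event rate per person-year (assumed constant)
    attributable_risk  population attributable risk for the model's predictors
    relative_risk      the patient's relative risk versus the reference profile
    competing_hazard   rate of competing events, e.g. death (assumed constant)
    """
    # Baseline hazard: the rate for a patient at the reference level of all predictors.
    baseline = incidence * (1.0 - attributable_risk)
    # Patient-specific hazard for the event of interest.
    cause = baseline * relative_risk
    total = cause + competing_hazard
    if total == 0.0:
        return 0.0
    # Probability the event of interest occurs first within the window.
    return (cause / total) * (1.0 - math.exp(-total * years))


# Hypothetical inputs, chosen only to show the call; NOT values from the study.
risk_5y = absolute_risk(
    incidence=0.004,
    attributable_risk=0.3,
    relative_risk=2.0,
    competing_hazard=0.02,
    years=5,
)
print(f"illustrative 5-year absolute risk: {risk_5y:.2%}")
```

A model fitted to registry data would typically use age-specific rates and integrate over the window rather than assume constant hazards, so this is a reading aid rather than a reproduction of the tool.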